A Hybrid Implementation of K-Means and HAC Algorithm and Its Comparison with other Clustering Algorithms
Abstract
A huge amount of data is produced every day in the information technology industry, but it is of no use until it is converted into useful information. Data mining is defined as the process of extracting hidden predictive information from large databases. It offers an easy, time-saving way to extract useful information from a large database instead of going through the whole database. There are various data mining techniques, and clustering is one of them. Clustering algorithms in particular attract significant attention from researchers around the world because they make similar data readily available in the form of clusters. Various types of clustering algorithms are available in the literature, each with its own pros and cons. In this research paper, a hybrid implementation of the k-Means and HAC clustering algorithms is presented. The hybrid approach is also compared with four other clustering algorithms, namely k-Means, DT, HAC, and VARCHA. The hybrid implementation was written in the Python scripting language, and the open source tool scikit-learn was used to compare the performance of the algorithms. The parameters used for comparison were accuracy, precision, recall, and F-score. The results show that the hybrid algorithm performs better than the existing ones.
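The abstract does not say how k-Means and HAC are combined. One common way to hybridize the two, shown here purely as an assumption and not as the authors' method, is to over-segment the data into many small k-means clusters and then merge their centroids with average-linkage agglomerative clustering. The function names `kmeans`, `hac_merge`, and `hybrid_cluster` and the parameter `n_micro` are hypothetical, and the sketch uses only the Python standard library rather than scikit-learn so that it is self-contained.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


def hac_merge(centroids, n_clusters):
    """Average-linkage agglomerative merging of centroids.

    Returns a dict mapping each centroid index to its final group id.
    """
    groups = [[i] for i in range(len(centroids))]

    def linkage(a, b):
        # Mean pairwise distance between the centroids of two groups.
        total = sum(math.dist(centroids[i], centroids[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(groups) > n_clusters:
        # Find the closest pair of groups and merge the second into the first.
        i, j = min(((i, j) for i in range(len(groups))
                    for j in range(i + 1, len(groups))),
                   key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        merged = groups.pop(j)  # j > i, so index i is unaffected
        groups[i].extend(merged)
    return {idx: gid for gid, members in enumerate(groups) for idx in members}


def hybrid_cluster(points, n_clusters, n_micro=6, seed=0):
    """Assumed hybrid: k-means micro-clusters, then HAC on their centroids."""
    centroids, micro_labels = kmeans(points, n_micro, seed=seed)
    group_of = hac_merge(centroids, n_clusters)
    return [group_of[lab] for lab in micro_labels]


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (illustrative data only).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(hybrid_cluster(data, n_clusters=2, n_micro=4))
```

In this arrangement k-means does the cheap first pass over all points, while the quadratic-cost agglomerative step only has to handle `n_micro` centroids. Whether the paper's hybrid follows this order or a different one cannot be determined from the abstract.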
Similar resources
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used so widely in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and can therefore become trapped in a local optimum. In this paper...
Improved COA with Chaotic Initialization and Intelligent Migration for Data Clustering
A well-known clustering algorithm is K-means. This algorithm, besides advantages such as high speed and ease of employment, suffers from the problem of local optima. In order to overcome this problem, a lot of studies have been done in clustering. This paper presents a hybrid Extended Cuckoo Optimization Algorithm (ECOA) and K-means (K), which is called ECOA-K. The COA algorithm has advantages ...
Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach
The clustering problem under the minimum sum of squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and cannot converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...
Data Clustring Using A New CGA(Chaotic-Generic Algorithm) Approach
Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...